National Repository of Grey Literature: 12 records found (records 1 - 10 shown)
Automatic post-editing of phrase-based machine translation outputs
Rosa, Rudolf ; Mareček, David (advisor) ; Žabokrtský, Zdeněk (referee)
We present Depfix, a system for automatic post-editing of phrase-based English-to-Czech machine translation outputs, based on linguistic knowledge. First, we analyzed the types of errors that a typical machine translation system makes. Then, we created a set of rules and a statistical component that correct errors that are common or serious and that our approach has the potential to correct. We use a range of natural language processing tools to provide us with analyses of the input sentences. Moreover, we reimplemented the dependency parser and adapted it in several ways to parsing statistical machine translation outputs. We performed both automatic and manual evaluations, which confirmed that our system improves the quality of the translations.
Exploring Higher Order Dependency Parsers
Madhyastha, Pranava Swaroop ; Zeman, Daniel (advisor) ; Ramasamy, Loganathan (referee)
Most of the recent efficient algorithms for dependency parsing work by factoring the dependency trees. In most of these approaches, the parser loses much of the contextual information during the process of factorization. There have been approaches to building higher-order dependency parsers: second order [Carreras2007] and third order [Koo and Collins2010]. In the thesis, the approach by Koo and Collins should be further exploited in one or more ways. Possible directions of further exploitation include but are not limited to: investigating possibilities of extending the approach to non-projective parsing; integrating labeled parsing; joining word senses during the parsing phase [Eisner2000].
Exploring Higher Order Dependency Parsers
Madhyastha, Pranava Swaroop ; Zeman, Daniel (advisor) ; Mareček, David (referee)
Most of the recent efficient algorithms for dependency parsing work by factoring the dependency trees. In most of these approaches, the parser loses much of the contextual information during the process of factorization. There have been approaches to building higher-order dependency parsers: second order [Carreras2007] and third order [Koo and Collins2010]. In the thesis, the approach by Koo and Collins should be further exploited in one or more ways. Possible directions of further exploitation include but are not limited to: investigating possibilities of extending the approach to non-projective parsing; integrating labeled parsing; joining word senses during the parsing phase [Eisner2000].
Detecting semantic relations in texts and their integration with external data resources
Kríž, Vincent ; Vidová Hladká, Barbora (advisor)
We present a strategy to automate the extraction of semantic relations from texts. Both machine learning and rule-based techniques are investigated and the impact of different linguistic knowledge is analyzed for the various approaches. To implement the extraction system RExtractor, several natural language processing tools have been improved: from sentence splitting and tokenization modules to dependency syntax parsers. Furthermore, we created the Czech Legal Text Treebank with several layers of linguistic annotation, which is used to train and test each stage of the proposed system. As a result of the performed work, new Semantic Web resources and tools are available for automatic processing of texts.
Detecting semantic relations in texts and their integration with external data resources
Kríž, Vincent ; Vidová Hladká, Barbora (advisor) ; Harašta, Jakub (referee) ; Pecina, Pavel (referee)
We present a strategy to automate the extraction of semantic relations from texts. Both machine learning and rule-based techniques are investigated and the impact of different linguistic knowledge is analyzed for the various approaches. To implement the extraction system RExtractor, several natural language processing tools have been improved: from sentence splitting and tokenization modules to dependency syntax parsers. Furthermore, we created the Czech Legal Text Treebank with several layers of linguistic annotation, which is used to train and test each stage of the proposed system. As a result of the performed work, new Semantic Web resources and tools are available for automatic processing of texts.
Detecting semantic relations in texts and their integration with external data resources
Kríž, Vincent ; Vidová Hladká, Barbora (advisor)
We present a strategy to automate the extraction of semantic relations from texts. Both machine learning and rule-based techniques are investigated and the impact of different linguistic knowledge is analyzed for the various approaches. To implement the extraction system RExtractor, several natural language processing tools have been improved: from sentence splitting and tokenization modules to dependency syntax parsers. Furthermore, we created the Czech Legal Text Treebank with several layers of linguistic annotation, which is used to train and test each stage of the proposed system. As a result of the performed work, new Semantic Web resources and tools are available for automatic processing of texts.
Building a dependency treebank for Yorùbá using parallel data
Oluokun, Adedayo ; Zeman, Daniel (advisor) ; Rosa, Rudolf (referee)
The goal of this thesis is to create a dependency treebank for Yorùbá, a language with very few pre-existing machine-readable resources. The treebank follows the Universal Dependencies (UD) annotation standard; certain language-specific guidelines for Yorùbá were specified. Known techniques for porting resources from resource-rich languages were tested, in particular projection of annotation across parallel bilingual data. Manual annotation is not the main focus of this thesis; nevertheless, a small portion of the data was verified manually in order to evaluate the annotation quality. Also, a model was trained on the manual annotation using UDPipe.
Syntactic analysis of code-switched texts
Ravishankar, Vinit ; Zeman, Daniel (advisor) ; Mareček, David (referee)
Vinit Ravishankar, July 2018. The aim of this thesis is twofold. First, we attempt to dependency parse existing code-switched corpora, solely by training on monolingual dependency treebanks. To do so, we design a dependency parser and experiment with a variety of methods to improve upon the baseline established by raw training on monolingual treebanks; these methods range from treebank modification to network modification. On this task, we obtain state-of-the-art results for most evaluation criteria for our evaluation language pairs: Hindi/English and Komi/Russian. We beat our own baselines by a significant margin, whilst simultaneously beating most scores on similar tasks in the literature. The second part of the thesis introduces the relatively understudied task of predicting code-switching points in a monolingual utterance; we provide several architectures that attempt to do so, and provide one of them as our baseline, in the hope that it will continue as a state-of-the-art result in future tasks.
Discovering the structure of natural language sentences by semi-supervised methods
Rosa, Rudolf ; Žabokrtský, Zdeněk (advisor) ; Tiedemann, Jörg (referee) ; Horák, Aleš (referee)
In this thesis, we focus on the problem of automatically syntactically analyzing a language for which there is no syntactically annotated training data. We explore several methods for cross-lingual transfer of syntactic as well as morphological annotation, ultimately based on utilization of bilingual or multilingual sentence-aligned corpora and machine translation approaches. We pay particular attention to automatic estimation of the appropriateness of a source language for the analysis of a given target language, devising a novel measure based on the similarity of part-of-speech sequences frequent in the languages. The effectiveness of the presented methods has been confirmed by experiments conducted both by us and independently by other respectable researchers.
Automatic post-editing of phrase-based machine translation outputs
Rosa, Rudolf ; Mareček, David (advisor) ; Žabokrtský, Zdeněk (referee)
We present Depfix, a system for automatic post-editing of phrase-based English-to-Czech machine translation outputs, based on linguistic knowledge. First, we analyzed the types of errors that a typical machine translation system makes. Then, we created a set of rules and a statistical component that correct errors that are common or serious and that our approach has the potential to correct. We use a range of natural language processing tools to provide us with analyses of the input sentences. Moreover, we reimplemented the dependency parser and adapted it in several ways to parsing statistical machine translation outputs. We performed both automatic and manual evaluations, which confirmed that our system improves the quality of the translations.
